We thank all reviewers for their thoughtful comments.

"The method is only compared to prior models with long-term memory on the [QA] task, and doesn't perform as [...]" Response: This is expected, as these are ML models with non-biological [...]. Our goal was to show that simple local Hebbian plasticity can be utilized to solve many of these tasks.

"Is it essential that the key-value [...]" Response: Our goal was to show that simple local plasticity is sufficient for many tasks.

"How and why do the query and storage keys [...]"

"[...] isn't it possible to achieve good performance on the tasks in the paper [...]" Response: This approach is rather close to the approach of MemN2N.

"[...] it would be helpful to explain the practical or physiological relevance in more detail."
Review for NeurIPS paper: H-Mem: Harnessing synaptic plasticity with Hebbian Memory Networks
The motivation of the model is unclear. In other words, why does this model work on the two tasks? It is not enough to say that the model uses a Hebbian rule, which agrees with biological systems, and should therefore work. A reason, or intuition, from a machine-learning perspective should be provided. I would like to see explanations for both tasks in the rebuttal.
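For intuition, the kind of mechanism under review can be sketched as a key-value memory trained with a purely local, Hebbian-style delta update: writing strengthens the association from a key vector to a value vector, and reading back is a single matrix-vector product. This is an illustrative sketch, not the paper's exact equations; the function names and the learning rate are assumptions.

```python
import numpy as np

def hebbian_write(W, key, value, eta=0.5):
    """Local Hebbian-style delta update: strengthen the key -> value
    association; the (value - W @ key) term gates learning so that
    repeated writes of the same pair converge instead of growing unboundedly."""
    return W + eta * np.outer(value - W @ key, key)

def hebbian_read(W, query):
    """Retrieval is a single matrix-vector product with the query key."""
    return W @ query

d = 4
W = np.zeros((d, d))
key = np.array([1.0, 0.0, 0.0, 0.0])    # one-hot key for clarity
value = np.array([0.0, 2.0, 0.0, 0.0])  # value to associate with the key

for _ in range(10):                     # repeated presentations
    W = hebbian_write(W, key, value)

recalled = hebbian_read(W, key)         # close to `value` after training
```

Both the write and the read depend only on pre- and post-synaptic activity, which is what makes the rule local.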
Temporal Knowledge Sharing enable Spiking Neural Network Learning from Past and Future
Dong, Yiting, Zhao, Dongcheng, Zeng, Yi
Spiking Neural Networks (SNNs) have attracted significant attention from researchers across various domains due to their brain-like information processing mechanism. However, SNNs typically grapple with challenges such as extended time steps, low temporal information utilization, and the requirement for consistent time steps between testing and training. These challenges leave SNNs with high latency. Moreover, the constraint on time steps necessitates retraining the model for new deployments, reducing adaptability. To address these issues, this paper proposes a novel perspective, viewing the SNN as a temporal aggregation model. We introduce the Temporal Knowledge Sharing (TKS) method, facilitating information interaction between different time points. TKS can be perceived as a form of temporal self-distillation. To validate the efficacy of TKS in information processing, we tested it on static datasets such as CIFAR10, CIFAR100, and ImageNet-1k, and on neuromorphic datasets such as DVS-CIFAR10 and NCALTECH101. Experimental results demonstrate that our method achieves state-of-the-art performance compared to other algorithms. Furthermore, TKS addresses the temporal consistency challenge, endowing the model with superior temporal generalization capabilities. This allows the network to train with longer time steps and maintain high performance during testing with shorter time steps. Such an approach considerably accelerates the deployment of SNNs on edge devices. Finally, we conducted ablation experiments and tested TKS on fine-grained tasks, with results showcasing TKS's enhanced capability to process information efficiently.
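Viewed as temporal self-distillation, the core idea of TKS can be sketched as follows: the prediction aggregated over all time steps serves as a teacher for the prediction at each individual time step. The loss form and names below are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temporal_self_distillation_loss(logits_per_step):
    """The time-averaged prediction acts as a teacher for every
    individual time step (KL divergence, averaged over steps)."""
    teacher = softmax(np.mean(logits_per_step, axis=0))
    loss = 0.0
    for logits in logits_per_step:
        student = softmax(logits)
        loss += np.sum(teacher * (np.log(teacher) - np.log(student)))
    return loss / len(logits_per_step)

T, C = 4, 3                                     # time steps, classes
rng = np.random.default_rng(0)
logits = rng.normal(size=(T, C))                # per-time-step outputs
loss = temporal_self_distillation_loss(logits)  # zero iff all steps agree
```

Because the teacher is built from the network's own outputs across time, no external teacher model is needed, which is what makes this "knowledge sharing" rather than conventional distillation.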
FreeLM: Fine-Tuning-Free Language Model
Li, Xiang, Jiang, Xin, Meng, Xuying, Sun, Aixin, Wang, Yequan
Pre-trained language models (PLMs) have achieved remarkable success in NLP tasks. Despite this great success, mainstream solutions largely follow the pre-training then fine-tuning paradigm, which brings both high deployment costs and low training efficiency. Nevertheless, fine-tuning on a specific task is essential, because PLMs are only pre-trained with a language signal from large raw data. In this paper, we propose a novel fine-tuning-free strategy for language models that considers both a language signal and a teacher signal. The teacher signal is an abstraction of a battery of downstream tasks, provided in a unified proposition format. Trained with both language and strong task-aware teacher signals in an interactive manner, our FreeLM model demonstrates strong generalization and robustness. FreeLM outperforms large models, e.g., GPT-3 and InstructGPT, on a range of language understanding tasks in our experiments, while being much smaller, with 0.3B parameters compared to 175B in these models.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (10 more...)
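FreeLM's teacher signal recasts labeled downstream examples as propositions to be judged true or false. A hypothetical sketch of such a unified proposition format follows; the task templates and names are invented for illustration and are not the paper's actual templates.

```python
def to_proposition(task, example, label_text):
    """Hypothetical template: recast a labeled example as a natural-
    language proposition; pairing it with a truth bit (1 for the gold
    label, 0 for a corrupted label) yields a unified teacher signal."""
    if task == "sentiment":
        return f'The sentiment of "{example}" is {label_text}.'
    if task == "nli":
        premise, hypothesis = example
        return f'"{premise}" entails "{hypothesis}".'
    raise ValueError(f"unknown task: {task}")

# One true proposition and one corrupted proposition for the same example:
pos = (to_proposition("sentiment", "a delightful film", "positive"), 1)
neg = (to_proposition("sentiment", "a delightful film", "negative"), 0)
```

Because every task reduces to the same proposition-plus-truth-bit shape, a single model can absorb many tasks at once, which is what removes the need for per-task fine-tuning.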
General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings
Galke, Lukas, Cuber, Isabelle, Meyer, Christoph, Nölscher, Henrik Ferdinand, Sonderecker, Angelina, Scherp, Ansgar
Large pretrained language models (PreLMs) are revolutionizing natural language processing across all benchmarks. However, their sheer size is prohibitive for small laboratories or for deployment on mobile devices. Approaches like pruning and distillation reduce the model size but typically retain the same model architecture. In contrast, we explore distilling PreLMs into a different, more efficient architecture, Continual Multiplication of Words (CMOW), which embeds each word as a matrix and uses matrix multiplication to encode sequences. We extend the CMOW architecture and its CMOW/CBOW-Hybrid variant with a bidirectional component for more expressive power, per-token representations for a general (task-agnostic) distillation during pretraining, and a two-sequence encoding scheme that facilitates downstream tasks on sentence pairs, such as sentence similarity and natural language inference. Our matrix-based bidirectional CMOW/CBOW-Hybrid model is competitive with DistilBERT on question similarity and recognizing textual entailment, but uses only half the number of parameters and is three times faster in terms of inference speed. We match or exceed the scores of ELMo for all tasks of the GLUE benchmark except for the sentiment analysis task SST-2 and the linguistic acceptability task CoLA. However, compared to previous cross-architecture distillation approaches, we demonstrate a doubling of the scores on detecting linguistic acceptability. This shows that matrix-based embeddings can be used to distill large PreLMs into competitive models, and it motivates further research in this direction.
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
- Europe > Germany (0.04)
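The CMOW idea of encoding a sequence by multiplying per-word matrices can be sketched in a few lines; unlike a bag-of-words sum, the result is order-sensitive. The initialization and dimensions below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def cmow_encode(token_ids, E):
    """CMOW-style encoding: each token id indexes a d x d matrix, and
    the sequence representation is the ordered matrix product."""
    enc = np.eye(E.shape[1])
    for t in token_ids:
        enc = enc @ E[t]
    return enc

rng = np.random.default_rng(0)
vocab, d = 10, 3
# Initialize near the identity so long products stay well-conditioned.
E = np.stack([np.eye(d) + 0.1 * rng.normal(size=(d, d)) for _ in range(vocab)])

ab = cmow_encode([1, 2], E)
ba = cmow_encode([2, 1], E)
order_sensitive = not np.allclose(ab, ba)  # matrix products do not commute
```

A bidirectional variant, as described in the abstract, would additionally multiply the sequence right-to-left and concatenate the two encodings; the CMOW/CBOW-Hybrid further concatenates an order-insensitive CBOW-style sum.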
Learning to classify complex patterns using a VLSI network of spiking neurons
Mitra, Srinjoy, Indiveri, Giacomo, Fusi, Stefano
We propose a compact, low-power VLSI network of spiking neurons which can learn to classify complex patterns of mean firing rates online and in real-time. The network of integrate-and-fire neurons is connected by bistable synapses that can change their weight using a local spike-based plasticity mechanism. Learning is supervised by a teacher which provides an extra input to the output neurons during training. The synaptic weights are updated only if the current generated by the plastic synapses does not match the output desired by the teacher (as in the perceptron learning rule). We present experimental results that demonstrate how this VLSI network is able to robustly classify uncorrelated linearly separable spatial patterns of mean firing rates.
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > San Mateo County > San Mateo (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
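The teacher-gated, perceptron-like rule described in the abstract (weights change only when the output neuron disagrees with the teacher) can be sketched in software, abstracting away the spiking dynamics and bistable-synapse hardware; all names below are illustrative.

```python
import numpy as np

def teacher_gated_step(w, x, teacher, theta=0.0):
    """Perceptron-like stop-learning rule: weights change only when the
    output neuron's response disagrees with the teacher's desired output."""
    out = 1 if w @ x > theta else 0
    if out != teacher:
        w = w + (x if teacher == 1 else -x)
    return w

# Four binary rate patterns; the neuron should fire iff input 0 is active.
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
y = [1, 1, 0, 0]

w = np.zeros(2)
for _ in range(10):                 # a few passes suffice on this toy task
    for x, t in zip(X, y):
        w = teacher_gated_step(w, x, t)

preds = [1 if w @ x > 0 else 0 for x in X]
```

The stop-learning condition matters in hardware: once the output matches the teacher, synapses stop switching, which keeps the bistable weights stable during correct classification.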
Sigma-Pi Learning: On Radial Basis Functions and Cortical Associative Learning
Mel, Bartlett W., Koch, Christof
The goal in this work has been to identify the neuronal elements of the cortical column that are most likely to support the learning of nonlinear associative maps. We show that a particular style of network learning algorithm based on locally-tuned receptive fields maps naturally onto cortical hardware, and gives coherence to a variety of features of cortical anatomy, physiology, and biophysics whose relations to learning remain poorly understood.
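A network of locally tuned receptive fields of the kind discussed here can be sketched as a Gaussian radial basis function layer; in the sigma-pi reading, each local response factorizes into a product of per-dimension tuning terms, which is where products of inputs enter. The parameters below are illustrative assumptions.

```python
import numpy as np

def rbf_output(x, centers, widths, weights):
    """Locally tuned receptive fields: each hidden unit responds only to
    inputs near its center; the output is a weighted sum of these
    local Gaussian responses."""
    acts = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * widths ** 2))
    return weights @ acts

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # two receptive-field centers
widths = np.array([0.3, 0.3])
weights = np.array([1.0, -1.0])

near_first = rbf_output(np.array([0.0, 0.0]), centers, widths, weights)
near_second = rbf_output(np.array([1.0, 1.0]), centers, widths, weights)
```

Since exp(-sum_i (c_i - x_i)^2 / 2s^2) equals the product over i of exp(-(c_i - x_i)^2 / 2s^2), each Gaussian unit is a product of one-dimensional tuning curves, i.e., a sigma-pi unit whose multiplicative terms are locally tuned.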